Automatic Construction of Wordnets by Using
Abstract
WordNet is one of the most valuable lexical resources in the Natural Language Processing community. Unfortunately, the benefits of building a WordNet for the Macedonian language have never been recognized. Because building such a lexical resource manually is time- and labor-intensive, we developed a method for constructing it automatically. In this paper, we present a new method for constructing non-English WordNets that uses the Princeton implementation of WordNet as a backbone, along with Google's translation tool and search engine. We applied the method to construct the Macedonian WordNet, which contains 17,553 words grouped into 33,276 synsets. The method is general, however, and can also be applied to other languages. Finally, we report the results of an experiment using the Macedonian WordNet to improve the performance of text classification algorithms.
Automatic construction of a wordnet using machine translation and language modeling (translated from the Slovene abstract): Wordnet is regarded as one of the most useful lexical resources in natural language processing, but none exists yet for Macedonian. Because manual construction of such a resource is extremely time-consuming and expensive, we decided to build it with automatic approaches. In this paper we present a method for constructing a wordnet in a chosen target language, starting from the English Princeton WordNet and using a bilingual dictionary, Google's online machine translation service, and Google's search engine to generate synsets. Although a wordnet can be built this way for any language, in this study we generated a Macedonian wordnet containing 17,553 words, or 33,265 synsets. We also test the resulting wordnet in an automatic text classification system to verify its practical usefulness.
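As a rough illustration of the general "expand" idea described in the abstract (keep the source synset structure, fill each synset with target-language words), here is a minimal Python sketch. It is not the authors' implementation: the synset ids, the tiny bilingual dictionary, the `toy_score` function, and the 0.5 threshold are all hypothetical. In the paper the candidate ranking involves Google's translation tool and search engine, whose details are not given in the abstract, so the scoring step is left as a pluggable callable.

```python
from typing import Callable, Dict, List, Set

# Toy stand-ins (hypothetical): a real run would read Princeton WordNet
# and a real English-to-Macedonian bilingual dictionary.
SOURCE_SYNSETS: Dict[str, List[str]] = {
    "toy:dog": ["dog", "domestic_dog"],
    "toy:financial_bank": ["bank"],
}
BILINGUAL_DICT: Dict[str, List[str]] = {
    "dog": ["куче", "пес"],
    "bank": ["банка", "брег"],  # "брег" is the riverbank sense
}


def build_target_synsets(
    synsets: Dict[str, List[str]],
    dictionary: Dict[str, List[str]],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Dict[str, Set[str]]:
    """Map each source synset id to the target words whose score passes the threshold."""
    result: Dict[str, Set[str]] = {}
    for synset_id, lemmas in synsets.items():
        # Collect every dictionary translation of every lemma in the synset.
        candidates: Set[str] = set()
        for lemma in lemmas:
            candidates.update(dictionary.get(lemma, []))
        # Keep only candidates judged relevant to this particular sense.
        kept = {word for word in candidates if score(synset_id, word) >= threshold}
        if kept:
            result[synset_id] = kept
    return result


def toy_score(synset_id: str, word: str) -> float:
    """Hypothetical relevance score; a placeholder for a translation- and search-based measure."""
    table = {
        ("toy:dog", "куче"): 0.9,
        ("toy:dog", "пес"): 0.8,
        ("toy:financial_bank", "банка"): 0.9,
        ("toy:financial_bank", "брег"): 0.1,
    }
    return table.get((synset_id, word), 0.0)


macedonian_synsets = build_target_synsets(SOURCE_SYNSETS, BILINGUAL_DICT, toy_score)
```

The point of the threshold step is sense selection: a dictionary returns translations for every sense of an English word, so each candidate has to be checked against the specific synset before it is kept.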
Similar resources
Enhancing Automatic Wordnet Construction Using Word Embeddings
Researchers have shown that a wordnet for a new language, possibly resource-poor, can be constructed automatically by translating wordnets of resource-rich languages. The quality of these constructed wordnets is affected by the quality of the resources used such as dictionaries and translation methods in the construction process. Recent work shows that vector representation of words (word embed...
The Automatic Mapping of Princeton WordNet Lexical-Conceptual Relations onto the Brazilian Portuguese WordNet Database
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Other important...
Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet
The paper reports on a series of experiments conducted to test the feasibility of automatically generating synsets for the Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons and then these lexicons were compared to the wordnet...
Complex Predicates in Indian Language Wordnets
Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set to constr...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...